Using the Output Embedding to Improve Language Models

Authors

  • Ofir Press
  • Lior Wolf

Abstract

We study the topmost weight matrix of neural network language models. We show that this matrix constitutes a valid word embedding. When training language models, we recommend tying the input embedding and this output embedding. We analyze the resulting update rules and show that the tied embedding evolves in a more similar way to the output embedding than to the input embedding in the untied model. We also offer a new method of regularizing the output embedding. Our methods lead to a significant reduction in perplexity, as we are able to show on a variety of neural network language models. Finally, we show that weight tying can reduce the size of neural translation models to less than half of their original size without harming their performance.
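
The following is a minimal, hypothetical sketch of weight tying in plain Python. It is not the authors' code. It assumes a toy language model with no hidden layer: the input word's embedding row `U[word]` is used directly as the hidden state `h`, and the logits are `V h`. The names `U` (input embedding), `V` (output embedding), `E` (tied matrix) and the helper functions are illustrative only. Tying means using one matrix `E` in both places, so its gradient is the sum of the input-side and output-side paths. The sketch counts how many embedding rows receive a nonzero gradient in a single step. In the untied model, only the input word's row of `U` is updated, while every row of `V` is. The tied `E` inherits that dense output-side update, which is consistent with the abstract's observation that the tied embedding evolves more like the output embedding. It also shows that tying halves the embedding parameter count of this toy model.

```python
import math
import random


def softmax(z):
    # Subtract the max for numerical stability.
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def init_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def embedding_grads(U, V, word, target):
    """Cross-entropy gradients for a toy LM with no hidden layer (assumption).

    Forward pass: h = U[word]; logits_j = V[j] . h; p = softmax(logits).
    Returns (grad_U, grad_V), shaped like U and V.
    """
    h = U[word]
    p = softmax([sum(a * b for a, b in zip(row, h)) for row in V])
    delta = [p_j - (1.0 if j == target else 0.0) for j, p_j in enumerate(p)]
    # Output embedding: every row j receives (p_j - y_j) * h.
    grad_V = [[d * h_k for h_k in h] for d in delta]
    # Input embedding: only the input word's row receives V^T delta.
    dim = len(h)
    grad_U = [[0.0] * dim for _ in U]
    grad_U[word] = [sum(delta[j] * V[j][k] for j in range(len(V))) for k in range(dim)]
    return grad_U, grad_V


def tied_grad(E, word, target):
    """With U = V = E, the gradient on E sums the input and output paths."""
    grad_in, grad_out = embedding_grads(E, E, word, target)
    return [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(grad_in, grad_out)]


def nonzero_rows(G):
    return sum(1 for row in G if any(abs(v) > 1e-12 for v in row))


if __name__ == "__main__":
    rng = random.Random(0)
    vocab, dim = 6, 4
    U, V = init_matrix(vocab, dim, rng), init_matrix(vocab, dim, rng)
    gU, gV = embedding_grads(U, V, word=2, target=3)
    print("untied params:", 2 * vocab * dim, "tied params:", vocab * dim)
    print("rows updated: input", nonzero_rows(gU), "output", nonzero_rows(gV))
    E = init_matrix(vocab, dim, rng)
    print("rows updated: tied", nonzero_rows(tied_grad(E, word=2, target=3)))
```

Run as a script, it should report 48 untied versus 24 tied embedding parameters, one updated row in the untied input embedding, and all six rows updated in both the untied output embedding and the tied matrix. A real model would place recurrent layers between the lookup and the softmax. That would change the gradient values but not the sparse-input versus dense-output pattern.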

Similar resources

Slim Embedding Layers for Recurrent Neural Language Models

Recurrent neural language models are the state-of-the-art models for language modeling. When the vocabulary size is large, the space taken to store the model parameters becomes the bottleneck for the use of recurrent neural language models. In this paper, we introduce a simple space compression method that randomly shares the structured parameters at both the input and output embedding layers o...

The Optimal Steering Control System using Imperialist Competitive Algorithm on Vehicles with Steer-by-Wire System

Steer-by-wire is an electrical steering system for vehicles that, combined with an optimal control system, is expected to improve the vehicle's dynamic performance. This paper aims to optimize two control systems, Fuzzy Logic Control (FLC) and Proportional, Integral and Derivative (PID) control, on the vehicle steering system using the Imperialist Competitive Algorithm (IC...

Link Prediction using Network Embedding based on Global Similarity

Background: Link prediction is one of the most widely studied problems in complex network analysis. Link prediction requires knowing the history of previous link connections and combining it with the available information. Local link prediction approaches based on node structure are fast but not accurate enough. On the other hand, the global link predicti...

Phishing website detection using weighted feature line embedding

The aim of phishing is to obtain users' private information without their permission by designing a new website that mimics a trusted website. Information technology specialists do not agree on a unique definition of the discriminative features that characterize phishing websites. Therefore, the number of reliable training samples in phishing detection problems is limited. M...

Efficient DMUs improvement based on input expenses reduction using data envelopment analysis

Nowadays, the main purpose of models designed with Data Envelopment Analysis (DEA) is to improve the outputs. In the method proposed by Khodabakhshi, which uses an output-oriented BCC model, the output increases when the input increases. In this article, we discuss efficient Decision Making Units (DMUs) in the input-oriented BCC model to reduce input expenses signifi...

Publication date: 2017